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Abstract: This paper addresses a vision-based cooperative search for multiple mobile 
ground targets by a group of unmanned aerial vehicles (UAVs) with limited sensing and 
communication capabilities. The airborne camera on each UAV has a limited field of view 
and its target discriminability varies as a function of altitude. First, by dividing the whole 
surveillance region into cells, a probability map can be formed for each UAV indicating the 
probability of target existence within each cell. Then, we propose a distributed probability 
map updating model which includes the fusion of measurement information, information 
sharing among neighboring agents, information decay and transmission due to environmental 
changes such as the target movement. Furthermore, we formulate the target search problem 
as a multi-agent cooperative coverage control problem by optimizing the collective coverage 
area and the detection performance. The proposed map updating model and the cooperative 
control scheme are distributed, i.e., assuming that each agent only communicates with 
its neighbors within its communication range. Finally, the effectiveness of the proposed 
algorithms is illustrated by simulation. 

Keywords: UAV; multi-agent network; target search; cooperative control 
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1. Introduction 

With the fast development of high resolution imaging devices and processing technologies, unmanned 
aerial vehicles (UAVs) with air-borne cameras are increasingly employed in civil and military 
applications such as environmental monitoring, battlefield surveillance and map building, where 
ground-target search is one of the major applications [1,2]. Target tracking and search have been one of 
the most popular utilizations of UAVs [3,4]. The conventional method for target search by UAVs in a 
closed region divides the whole surveillance region into cells, and associates each cell with a probability 
or confidence of target existence in the cell which constitutes a probability map for the whole region [5,6] . 

In [7], an online planning and control method is proposed for cooperative search by a group of UAVs, 
where each agent keeps an individual probability map for the whole region updated according to the 
Dempster-Shafer theory. A path planning algorithm is designed by using the obtained measurement 
information, which requires each agent to directly communicate with all other agents. In [8], target 
detection is considered as part of an integrated mission including coverage control and data collection 
as parallel tasks for multi-agent networks. The coverage control method aims to maximize the 
joint detection probability of random events and the probability of target existence is updated by 
measurements based on the Bayesian rule. However, only the measurement information of direct 
neighbors is exchanged, which makes it difficult to obtain the target information of the whole surveillance 
region. In [9], a decentralized gradient-based control strategy is proposed for multiple autonomous 
mobile sensor agents searching for targets of interest by minimizing the joint team probability of no 
detection within action horizon based on range detection sensing model. However, each agent is required 
to collect detection information from all other agents. In [10], a decentralized search algorithm is 
developed which includes a two-step updating procedure for the probability maps. Each agent first 
obtains observations over the cells within its sensing region and updates its individual probability map 
by the Bayesian rule. Then, each agent transmits its individual probability map to its neighbors for map 
fusion. This algorithm is distributed and full network connectivity is not required. However, the lack 
of information correlation makes the map fusion difficult and only a heuristic fusion method is given 
in [10], the performance of which has not been analyzed. In our recent work [1 1], a distributed iterative 
map updating model is proposed to fuse the information from measurements and the maps of neighbors 
based on a logarithmic transformation of the Bayesian rule. Through this, the nonlinear Bayesian update 
is replaced by a linear one which simplifies the computation. The convergence speed of individual 
probability map of an agent is also analyzed under fixed detection and false alarm probabilities for the 
search of static targets. 

The cooperative control is an important task for efficient target search by a group of UAVs. Compared 
with the centralized control algorithms, distributed control algorithms are more robust to accidental 
failures of UAVs and breaks of communication links [12]. In [13], a distributed multi-agent coverage 
control method is proposed based on a given sensing performance function related to the distance to 
robots and gradient descent algorithms are designed for a class of utility functions to optimize the 
coverage and sensing performance. In [14], a distributed, adaptive control law is developed to achieve 
an optimal sensing configuration for a network of mobile robots which obtain sensory information of 
a static environment and exchange their estimates of the environment with neighbors. In [15], a 
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three dimensional distributed control strategy is proposed to deploy hovering robots with downward 
facing cameras to collectively monitor an environment. A new optimization criterion is defined as the 
information obtained by each pixel of a camera. In [16], a dynamic awareness model is proposed to 
control a multi- vehicle sensor network with intermittent communications. The state of awareness of 
each individual vehicle is updated by its own sensing model and sharing information with its neighbors. 
However, none of the coverage control schemes mentioned above has considered the detection results 
of target existence which may affect UAVs' movement decisions in target search. Moreover, there are 
very few works addressing the issue of distributed vision-based cooperative search for multiple mobile 
targets with probabilistic detections. 

In this paper, we investigate the vision-based cooperative search for multiple ground mobile targets 
by a group of UAVs with limited sensing and communication capabilities. The main contribution 
of this paper is that a distributed strategy of information fusion and cooperative control is proposed 
for searching multiple mobile targets using multi-agent networks based on probabilistic detections. 
In addition, the time-varying detection and false alarm probabilities are considered which are due 
to the varying altitudes of the agents with 3-dimensional dynamics. Each agent under our search 
strategy shares local target information and controls its own behavior in a distributed manner. Based 
on the probability map updating model proposed in [11], we generalize the model by considering the 
information decay and transmission between cells due to environmental changes such as the target 
movement. The influence of the time-varying detection probability on the update of probability maps 
due to the three-dimensional UAV dynamics is also analyzed. Then, a coverage optimization problem is 
formulated to balance the coverage area and detection performance. The proposed map updating model 
and cooperative control scheme are distributed, i.e., each agent only communicates with the agents within 
its communication range. 

This paper is organized as follows: Section 2 describes the basic notations and assumptions used in 
this paper. Section 3 presents the probability map updates by measurements and information sharing 
with time- varying detection probabilities. In Section 4, a three-dimensional coverage control method is 
presented for target search. Simulation results are shown in Section 5, and the conclusions are drawn in 
Section 6. 

2. Basic Definitions and Assumptions 

The surveillance region O £ I 2 is assumed to be on a plane ground and has been uniformly divided 
into a set of cells of the same size. We assume that all UAVs (or agents) use the same global Cartesian 
coordinate system and the position of each agent is denoted as = [cj k , hi t k\ T £ for agent i 
(i = 1, 2, ■ ■ ■ , N) at time k (as shown in Figure la), where c i: k € IR 2 is the planar coordinate of its 
projection on O, h^u G IR is the altitude of the agent above O, N is the total number of agents and "T" 
denotes the transpose operation. Each agent is assumed to have access to its own position at any time. 
Each cell in the surveillance region is associated with a probability or confidence of target existence 
within the cell which is modeled using the Bernoulli distribution, i.e., 6 9t k = 1 (a target is present) with 
probability p (Q 9y k = 1) and 9 g:k = 0 (no target is present) with probability 1 — P (9 g = 1) for agent i 
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and cell g at time k, where g E M 2 is the location of the cell center in O. If more than one target are 
present within a cell, they are treated as one single target. 



Figure 1. Target search by multiple UAVs. (a) A network of UAVs; (b) Target image taken 
by an airborne camera. 




(a) (b) 

In this paper, we mainly discuss about the vision-based detections where each agent carries an 
airborne camera facing downward to surveillance region (as shown in Figure la). Each agent 
independently takes measurements Z^ g ^ k over the cells within its sensing region Ci )k at time k, where 

C ijk = {geO: \\g - Ci >k \\ ^ h i>k tan tp} 

and ||« || denotes the 2-norm for vectors. Each agent is assumed to have the same angle of field of view, 
half of which is denoted by <p. We also assume that the size of each cell is sufficiently small comparing 
with the size of so that we can ignore the boundary effect and roughly consider a cell as wholly 
within Cijfc if its center is within Ci ;k . Only two observation results are defined for each cell, Zi >g>k = 0 
or Z ii9ik = 1. For all cells, P {Z i)g>k = l\0 g>k = 1) = p i>k and P {Z i>g>k = l\6 g>k = 0) = q i>k are assumed 
to be known by agent i as the detection probability and false alarm probability respectively. 

The topology of the network of all agents at time k is modeled by an undirected graph Q k = (£ k , V). 
V = {1,2,..., N} is the vertex set and £ k = {{i, j} : i,j e V; \\fM,k — /Aj,fc|| ^ -^c} is the edge set, 
where each edge {i,j} is an unordered pair of distinct agents and R c is the communication range of each 
agent. The graph or the network is connected if for any two vertices i and j there exists a sequence of 
edges (a path) {i, u ± }, {u u z/ 2 }, . . . , {iV-i, ^n}, {v n J} in E k . Let J\f iik = {j G V\ {ij} G S k } U {i} 
denote the set of neighbors of agent i at time k where an agent is assumed to be a neighbor of itself. The 
degree (number of neighbors) of agent i at time k is denoted as d i:k = \JVi >k \. 

3. Probability Map Update 

3.1. Bayesian Update and Consensus-Based Map Fusion 

In [11], we proposed a cooperative control scheme for target search in multi-agent systems. In a 
group of UAVs, each agent i keeps an individual probability map Vi ;9>k of the whole region, where 
Pi, g ,k — Pi (9 g ,k = 1) and is updated by the Bayesain rule: 
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i,g,k 



P (Zi,g,k\@g,k ~ 1) ^g.fc-l 



P {Zi,g,k\9g,k ~ 1) Pi,g,k-1 + P {%i,g,k\Qg,k ~ 0) (1 — "Pi,g,fc-1 
Pi,kPi,g,k—l 



Pi,kPi,g,k-l + <?i,fc (1 ~ Pi,g,k-l) 
(1 - Pi,k) Pi,g,k-1 

(1 - Pi,k) Vi, g ,h-X + (1 - %k) (1 - Pi,g,k-l) 

Vi, g ,k-i otherwise 



if %i,g,k — 1 
if Zi,g,fc = 0 



(1) 



where 0 < 7\ g ,o < 1 and 1 > p i>k , q i>k > 0. For the cases with p i>k = 0 or 1 or q i)k = Oor 1, simplified 
conclusions can be obtained as shown in [11] and will not be considered in this paper. By letting 



Qi, 



i,g,k 



In 



1 



- 1 



i,g,k 



(2) 



we get the following transformation of Equation (1): 



Qi,g,k Qi,g,k—1 "Vi,g,k 



(3) 



where 



Vi g,k 



ln^ 

Pi,k 



if Zi 



i,g,k 



In i — 'Hid*, 7. , — 
0 otherwise 



(4) 



Keeping as the updated term instead of Vi,g,k simplifies the nonlinear update in Equation (1) into 
the linear one in Equation (3). For a group of UAVs, we let each agent i at time k first take measurements 
and transmit the measurements to its neighbors. After receiving the measurements from all its neighbors, 
Qi gjc is updated as follows: 

Hi n h = Qi n fc_1 + > Vi n h (5) 



'/.//. k Qi,g,k-l J r v j,g,k 



Then, each agent i transmits the updated Qi, g>k of the whole region to its neighbors for map fusion, 
which is given by: 

Qi,g,k — VJi,j,kHj t g t k (6) 



where w i;i>k = 1 - -^-^ , 
composed of Wij k can be defined as: 



^ for j e A/i,jfc 0 7^ and tfjj^ = 0 for j ^ Af itk . Then, a matrix 



[ w i,j,k\ NxN v> ^ 



iV) 



(7) 



which is a doubly stochastic matrix [17]. The communications of neighboring agents are assumed to be 
synchronized within a short time interval. Time synchronization in distributed networks is not the focus 
of this paper and has been addressed by many works [18-21]. 
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3.2. Time-Varying Detection Probability 

In [1 1], we only considered a 2-dimensional control scheme assuming that all agents move on a fixed 
plane parallel to the ground plane. However, in the real world, UAVs such as helicopters can change 
their altitudes according to their task requirements so as to enlarge their sensing area (here we only 
consider cameras with a fixed zooming level). Therefore, in this paper, we will consider the influence of 
3-dimensional dynamics of UAVs on the detection performance. 

For vision-based detection, the detection probability relies on the picture resolutions. Figure lb shows 
the basic imaging scheme by an airborne camera similar to the one given in [15,22]. In general, a 
desirable property for good target recognition is a "right" ratio between the size of the image and the 
size of the target, where "right" depends on the target type and the detection algorithm that is employed. 
To be more simplified, it can be assumed that the larger the image of a target in the picture (in terms of 
the number of occupied pixels) obtained by the UAV, the easier for the UAV to discriminate the target 
no matter what recognition method is used. Hence, we can model the target discriminability of a UAV 
as a function p proportional to the ratio between the size of a target image taken by the camera denoted 
by Stj and the size of one pixel denoted by Sp, i.e., p oc Here, we assume that all targets are of the 
same visual properties such as color, shape and size that are influential on target discriminability. It is 
also assumed that each camera has a fixed focal length so that we can only consider the change of p due 
to the variation of agent altitude. Then, by denoting the size of the projection of a target on the ground 
plane as 5* T , we can derive that: 

Sti St & 2 S*t _ 
poc— — = — — (8) 
jx op n op 

where h is the altitude of the UAV and b is the fixed distance between the image and the lens (as shown 
in Figure lb). In a multi-agent system, for the i-th agent at time k, we have p ik oc K S q • From 
Equation (8), we may get p i>k — >■ oo as h i)k — > 0. However, in reality, pi ik cannot be infinitely large 
and there should be an upper limit when h^ k is smaller than a threshold h. That is to say, the target 
discrimination ability will not be improved any more if a UAV is descending very close to the ground. 

The target discriminability determines the detection probability when a UAV is detecting the 
existence of targets within each cell under surveillance. It is natural to conceive that the detection 
probability Pi jk increases and the false alarm probability q iik decreases as p^ k increases. When the 
altitude of the UAV becomes larger than a threshold h, it runs out of its ability to discriminate any 
target from the background environment, which means that the detection result dose not rely on 
the true existence of the target any more. That is, if h^ k ^ h, we have P (Zi tg>k = 0\9 g = 1) = 

p(z^ k = i\e g = i) = p(z^ k = i\e g = o) = p(z ii9ik = i\e g = o), i.e., Pi , k = %k = 0.5. 

Generally, when h ik e [h,h\ (h< h), p iyk should be a monotonically increasing function of p ijk , or 
more explicitly, a monotonically decreasing function of h itk . Therefore, we assume the following 
detection probability model: 

{0.5 if hi ik ^ h 

fx {h i>k ) ifh< h ijk < h (9) 
p if 0 < h itk ^ h 
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where f{ (h ik ) < 0 for h^k G (h, h) and 1 > fi (h) = p > /i (/i) = 0.5. Similarly, we can assume the 
false alarm probability model as a monotonically increasing function of h^k- 

{0.5 if h ik ^ h 

f 2 (hi,*) ifh< h itk < h (10) 
q if 0 < h ik ^ h 

where f 2 (h iik ) > 0 for h ijk G (/i, /i), 0 < / 2 (h) — q < f 2 (h) = 0.5. In this paper, the altitude h i>k 
of an agent is allowed to vary from 0 to oo for theoretic analysis, though it may not happen in the real 
world due to system limitations. 

Remark 1. Model (9) is motivated by the natural understanding of the interaction between the altitude 
of an agent and its detection and false alarm probabilities. It only reflects the general relation between 
those parameters, and is not restricted to a specific parametric representation of fi and f 2 . Hence, 
our method is applicable for any detection probability function that fits for the model. An experimental 
detection probability model of CCD camera has been given in [22 ]. 

Remark 2. p itk and q,^ k can also be cell-dependent, i.e., they may vary from place to place due to 
environment conditions. For example, the target is often easier to be discriminated on an open ground 
than on a land with trees. In complex environments, agents must know the detection probability and false 
alarm probability models of different type of regions. For the ease of expression, we assume the models 
to be constant across the whole surveillance region. 

Denoting by m i g k the number of observations taken over cell g up to time k by agent i and defining 
m g,k — J2iL\ m i,g,k' we can § et me following conclusions for the update of Qi, g ,k- 

Theorem 1. Given the initial prior probability map 0 < 7\ s ,o < 1 Vi G V, if there exists a constant 
e > 0 such that pi, k ^ 0.5 + e and q i>k ^ 0.5 — e Vz G V, and the network topology Q k is connected at 
all times, the following conclusions hold by implementing the map updating rule (5) and (6). 

(1) If a target is present within cell g, Qi t9 , k -— -> — oo {i.e., Vi, g ,k — > 1) Vz G V as m g , k — > +oo. 

(2) If no target is present within cell g, Qi :9 , k — > +oo (i.e., Vi >g> k — > 0) Vz G V as m g , k — > +oo. 

Proof. See Appendix A. □ 

3.3. Environment-Based Probability Map 

In the map updates (5) and (6), the effect of the environmental changes such as the information 
decay and transmission between cells has not been considered. For example, if targets may randomly 
appear or disappear during search, the historical information about the target existence cannot reflect the 
true current situation and revisits of certain frequency to the detected cells are needed for information 
update. This problem can be formulated as the information decay for each cell. If a target may 
move from one cell to another, then part of the information for the former cell should be removed 
and counted as the new information for the latter cell. This problem can be formulated as the 
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information transmission between each two cells. Therefore, we need to generalize the aforementioned 
map updating model to be applicable to the case with such environmental changes. Similar to the 
assumption made in [16], we assume that Qi i9ik decays exponentially for each cell if there is no prior 
knowledge and/or no measurement information. The information transmission between cells due to 
target movement is modeled based on the transition of probabilities. In addition, the prior knowledge 
about the environmental change is taken as the system input. All these lead to the following generalized 
updating model for Qi, g ,k- 

Hi,g,k 6 ^ Q'i,g,r,kfoi,g,r,kQi,T,k—\ ^ ^j,g,k £,i,g,k 

^ ' (11) 

Qi,9,k = / j w i,j,kHj,g,k 

where a ^ 0 is the information decaying factor, T is the sampling period of all UAVs, ai, g , r ,k and 
bi, g ,r,k are the information transmission factors which are nonnegative, and is the input information 
vector given by the prior knowledge about the target existence within cell g. Specifically, bi^^ yk satisfies 

b iy g ig>k = 1 and b it9 ^ k = 0 (g ^ r) for Q^ r ,k-i > 0, and b i>g ^ k = P (o gik = 1 Q r ^ x = l) for 
Qi, r ,k-i < 0. a i<gt r tk is determined by a i>gt f uk = 1 and a iiSjr . jfc = 0 (r ^ fj), where 

fj = argmin b ijgirjk Q ijr , k -i 

reB iigik (12) 

Bi,g,k = {r e O : b hg ^ k > 0} 

Remark 3. a>i, g ,r,k and 6i lS , r ,fc are defined based on the physical meaning of information transmission due 
to the target movement in the real world. Since the combination ofQi yT ^ k (r G O) into a cell g involves the 
combination of historical measurement information of all cells rGO, the correlation of which may not 
be known, we need to be careful in dealing with the fusion of such information. IfQi^ jk > 0, we are more 
confident that no target exists within cell g. Otherwise, we are more confident that a target exists within 
cell g. Since an information transmission out of a cell at time k is expected to occur only when a target 
exists within the cell at time k — 1, we let b i g ^ k = 0 ifQi, r ,k-i > 0, which means there is no transmission 
of information (or target movement) from cell r to cell g. IfQi >r ,k-i ^ 0, an information transmission 
occurs from cell r to cell g due to the possible target movement from r to g and the amount of information 
transmitted should be proportional to P (o gi k = 1 &r,k-i — lj> z - e -> equal to 6i lff ,r,fc<5i,r,fe-i- The smaller 
bi, g ,r,k is, the less amount of information is retained for cell g. Furthermore, by assuming that within one 
cell there can only exist up to one target at a time, i.e., at most one target can move into a cell at a time, we 
select the information stream with the largest transmitted amount as the newly stored information for cell 
g when there are incoming information streams from multiple cells r G B 9jk . That is, to take the smallest 
bi,g,r,kQi,r,k-i subject to Qi >r k-i ^ 0 as the newly stored information after the transmission, which 
corresponds to the most probable target movement to g in all possible movements to g from different 
cells. The information decaying factor a is set to be positive in the case that the prior knowledge of 
bi, g ,r,k is not accurate or targets may appear and disappear unpredictably during the search. In this 
case, the information decay makes the agents revisit the detected regions at a certain frequency. As for 
the input information vector £, it only denotes the effect brought by the prior knowledge and there is no 
need to calculate it out in real implementations, because any prior knowledge on the target existence 



Sensors 2014, 14 



9416 



can be directly used to update the probabilities of target existence and thus update directly following its 
definition in Equation (2). 

Here we give a simple example to illustrate how the parameters are designed if the true target dynamic 
model is give by x k +\ = ^x k where %k is a vector including the target location. In this case, one can 
calculate the transition probability P (0 g ,k+i = M@r,k = 1) far any two cells r and g where 6 Tyk = 1 
represents that the target locates within cell r at time k. Following this, given the current accumulated 
information on target existence Qi tTlk of agent ifor cell r at time k, one can calculate &i, g ,r,fe following its 
definition. Further, with the results ofb^g^^for any two cells r and g, one can calculate cn, g , r ,k according 
to its definition in Equation (12). If the target will not suddenly disappear/appear, the decaying factor a 
can be set as zero. 

Define the following augmented variables: 



A k = diag [Ai ik , ■ ■ ■ , A N)k ] 

where r and s are respectively the row and column indices of an appropriate cell in A^ k , and M is the 
total number of cells, we get the following generalized updating model: 



where eg) denotes the Kronecker product. 

According to Theorem 1, ||Qfc|| can be seen as the gathered information for decision making on 
the target existence and the larger the ||Qfc||, the less the uncertainty about the target existence or 
nonexistence. Hence, our aim of controlling the UAVs is to maximize ||Qfc|| in some sense, which 
will be discussed in the following section. 

4. Cooperative Coverage Control 

In the previous section, a distributed map updating scheme was proposed for fusion of the knowledge 
of multiple agents. In this section, we will design a cooperative control strategy that optimizes the 
trajectories of agents for target search based on their real-time updated knowledge about the target 
information. Within each time interval, an agent first updates its probability map by the the map updating 
scheme designed in Section 3.3 and then makes a control decision on which place it should move to for 
the next observation by collective optimization which will be addressed in this section. The two steps 
make the whole network form a closed-loop sensing and feedback control system. 

Here we consider the waypoint motion model for each agent: 




Q fc = e~ aT (W k <g> J) A fc Q fc _! + (W k ® I) (V, + £ k ) 



(13) 



fJ-i,k — ^i,k-l + Ui lk 



(14) 
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where u i:k G IR 3 is the control input (or the waypoint displacement) of the i-th agent at time k. Note 
that the above motion model only deals with the waypoints of agents at discrete-time steps. The true 
dynamics of agents is not discussed in this paper since we do not want to limit our results on the dynamic 
model of any specific type of UAV. How to make the agents achieve the desired waypoints by their 
inner-loop flight controllers is a technical issue which will not be addressed in this paper but left to be 
solved in our real system experiments. Our job is to optimize the selection of the next waypoint (i.e., u ijk ) 
for each agent given its current waypoint (i.e., Ui t k-i)- 
Following Equation (13), we can get 

Q k = G k + {W k <g> I) V fc 

where G k = e~ aT (W k <g> I) A k Qk-i + (W k <8> I) £ fc . At time k — 1, G k can be seen as the prior 
information, and the information gathered from measurements. Since V k and E [V k ] are both related 
to the true target existence which is unknown, we cannot predict the values of Q fe or E [Q fc ] before taking 
measurements. What we can do at time k — 1 is to find the optimal next time sampling position jj, i)k so 
as to maximize the information to be gathered at time k. More precisely, the problem can be formulated 
as the optimization problem: 

(15) 

where [i k = [fij k , . . . , fijf k ] 1 . Considering that W k includes the global topological information which 
is often hard to obtain for each individual agent in a distributed system, and ||(Wjfc <S> /) V fe || ^ ||V&||, 
we replace Equation (15) with the following suboptimal optimization: 

N 

maxE [||V fe || 2 ] = EE E E fe.J (16) 
geo i=l j£JVi, k 

Notice that Equation (16) is not an approximation of Equation (15), but a new cost function we intend 
to optimize which is an upper bound of Equation (16). Such way of defining the cost function is very 
common in statistics and estimation theory such as the Cramer-Rao lower bound, which is often selected 
as the cost function if the true variance of estimation error is time- varying and unknown. 

Following Equation (16), we should try to maximize E \v 2 - g fc ] and the collective sensing area of all 
agents. From Equation (4), we get for g G Cj ;k : 

HO g , k = l 

2 

otherwise 

It is straightforward to find that E is monotonically increasing with respect to pj >k and 

monotonically decreasing with respect to qj ik no matter 6 g k = 1 or not. Thus, Equation (16) is further 
replaced with the following optimization problem: 

N 

max'HOfc) = E / 4>k(r)(p itk -q i<k )lr c idr (17) 

where <\> k is a given nonnegative weighting function of r G O at time k, and its influence on the control 
law will be shown later. {M ljfc , ... M^^} is a partition of O at time k subject to fx i:k G M ijfc , such as the 



maxE 

Uk 



|Qa — Gjfell Qfe-i, £, 



p j> k ln Si + - p j> k "> ln 



-Jljjh 



1j,k 



ln £ + ( 1 -^) ln r 



-Pj,k 
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Voronoi partition. The introduction of the partition is for avoidance of collision between UAVs and ease 
of dealing with the overlapped sensing regions between neighboring agents, which will be discussed 
later. Since p iik ^ q ijk , % (fi k ) 1S always nonnegative and % (fit.) = 0 if h ijk ^ h. Denoting by d (•) the 
boundary of the corresponding region and ngu) (r) the outward pointing normal vector of the boundary 
d (•) at point r, we can compute the gradient of % (fi k ) as follows. 

Theorem 2. The gradient of the cost function % (fi k ) with respect to fi itk (h^ k < h) is given by 
dH (fi k ) 



dH (fi k ) 



i.k 

for hi G (h, h), and 



(Pi,k ~ %k) / <Pk (r) n S 2 (r) dr 
. v (fi(hi,k)-f2(Kk)) <j)k(r)dr + (p i)k -q Lk )tan(p / (j) k (r)dr 



(18) 



dH (fik) 



(P — <?) t an V 9 / 4>k (f) dr 
is 2 



(19) 



9h i>k 

for hi* e (°> h), where c i>0 G O, h i>0 G (0, h), Sj = M i)fc f| Q,fc S? = M Jjfe f| 9 

Proof. See Appendix B. □ 

Following Theorem 2, a gradient-based control law is given by 

dH (fi k ) 



Hi k Km 



dfi. 



i k 



(20) 



Mfc=Mfe-i 



where K u is a positive gain parameter. A larger K u may lead to faster convergence of to the sub-optimal 
configuration, but may also cause larger convergence error or oscillation around the settle points due to 
the discrete-time control. In real system implementations, users should choose the parameter by trading 
off the two performance indices. 

Remark 4. Note that the control input is always upper bounded in real systems, i.e., ^ u max for 

some positive number u max and the height of each agent is also bounded by h itk < hto have meaningful 
detections. Moreover, to avoid collision when neighboring agents are at the same altitude, the motion of 
each agent should be constrained by c^+i G M.^ k . Therefore, the control law (20) is modified as follows 
to be adapted to the constraints: 



K m(fi k ) 



dfi it t 



(21) 
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where is a scaling factor defined by 

Xi t k = argmax 



s.t. 



on (ji k ) 



XK, 



on M 



dp,. 



i k 



U r\ 

OfJ>i,k 
Umax 



c ijk -i + XkJ^^- e M 4ife _ 1 \A, fc -i 
oc iik 

Kk-i + XK U —- < h - e 2 



(22) 



i k 



Ajfc w « buffer region enclosing the border ofM.^ defined as follows: 



A,: 



reM ti : mm ( ei, mm \\a, k - 



< mm \\r-s\ 



(23) 



where ei, e 2 > 0 are ,g/ven parameters for limiting the width of the buffer region and the height of each 
agent respectively. 

Generally, it is favored that UAVs stay longer in the region with less gathered information to take more 
measurements. Thus, we define the weighting function <pi^ (g) as a function of the gathered information 

||Qi, 9 ,fc-i|| for each cell, i.e.: 

i~K<t> Qi,g,k — l 



<t>i,k (g) 



(24) 



where K$ is a positive gain parameter. By this model, cells with less gathered information are given 
higher weights for detection. There is no specific rule for choosing the optimal since it only denotes 
the user's preference on the search priority for different cells. In general, is only required not to be 
too large or too small in order to properly scale the weights of different cells. For example, we find that 
= 2 is one of the many suitable settings in our simulation. 

Remark 5. The partition {Mi ^, . . . Mjv,fc} can be static or time-varying. Partition is commonly used 
where each UAV only takes charge of one part of the whole surveillance region so that the whole 
searching task is shared by multiple agents. Users can predefine the task regions for each UAV or let the 
UAVs dynamically compute the partition following some rules. An example of the dynamic partition is 
the Voronoi partition which has been widely used in the distributed control [13]. 

5. Simulation 



5.1. Simulation Environment 

We deploy multiple UAVs to search for four ground targets. The whole surveillance region is a square 
region of [0, 50] x [0, 50] m 2 as shown in Figure 2a, within which lie two crossing roads denoted by 
Or C O. The four targets stay or move only on the roads and no target appears outside the roads in 
the surveillance region. At time k, each target z (z = 1, 2, 3, 4) randomly moves to one of the cells in 
the set {g G 0_r : \\g — Tar 2)fc _i|| ^ V Tar } where Tar 2jfc _i is the cell it stays in at time k — 1 and V Ta r is 



the largest possible speed of target movement. Hence, P ( 9 g ^ k 



r,fc-l 



1 
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r G Or, where £> r = {g e O fi : ||gr — r|| < V Tar }. Initially, we set Qi, gt o = 0 for all i and g within roads 
(i.e., Vi t g t k = 0.5 for g e Ok), and Qi, 9 ,o to a fixed large value for g outside the roads (i.e., Vi j9; k ~ 0 
for g ^ Or). The detection probability function and the false alarm probability function are assumed 
to be fx (h hk ) = tf ie -*»K*-*) a , and f 2 (h hk ) = K z e K ^ K ^ respectively where K lt K 2 , K 3 , K A 
are positive parameters satisfying the conditions in Equations (9) and (10). We test the proposed target 
search method in two scenarios. In Scenario I, all targets appear at k = 0 and keep stationary during 
the whole searching process, i.e., V Tar = 0 m/s. In Scenario II, we set V Tar = 1 m/s to test the influence 
of target mobility on the convergence of probability maps. The four targets also appear at k = 0 but 
do not disappear during the search. In these two scenarios, we verify the effectiveness of the proposed 
target search method by deploying different number of UAVs and using different information decaying 
factors. The initial positions of UAVs are randomly selected within region [0,5] 3 m 3 . The partition 
{M l fc , . . . Mtv^} is generated by Voronoi partition. The communication range is set as R c = 20 m 
and the communication control protocol in [11] is applied for connectivity maintenance. A distributed 
fT-connectivity maintenance algorithm has also been developed by the authors in [23] which can be 
applied in cooperative target search. Readers may refer to the references for more details on the 
communication protocols or maintenance algorithms. The cell size is fixed as 1 x 1 m 2 Other key 
parameters are respectively set as K u = 0.3, = 2, q = 0.1, p = 0.99, q = 0.01, h = 10 m, h = 5 m, 
a = 0, u max = 2 m/s and T = 1 s. 

Figure 2. The convergence of the probability map of an agent in Scenario I. (a) k = 0 s; 
(b) k = 10 s; (c) k = 30 s; (d) k = 50 s; (e) k = 70 s; (f) k = 90 s. 
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Since the convergence of the individual probability map Vi 9) k of agent i implies that the weight 
<j>i,k (g) defined by Equation (24) approaches 0 for each cell, we define the following average weight to 
evaluate the convergence performance of the whole network: 



NM, 



R 



N 

EE 

i=l g£<0 R 



4>i,k {g) 



where Mr denotes the total number of cells within the roads. It is easy to find that the initial value of 



0fc is 0c 



--fQ>||Qj l9 ,o|| 



1. In the simulations, we compare the results of 0 fc with 



'0 NM R ^i=l ^jEOfl 

different system parameters. The results are averaged from 200 Monte Carlo simulations. 



5.2. Simulation Results 



Figure 2 shows an example of the convergence process of individual probability maps in Scenario I 
with stationary targets, where the probabilities converge to 1 for the cells within which targets truly 
exist and 0 for the cells within which no target exists. The snapshots of UAVs in Scenario I are shown in 
Figure 3. Additionally, <fi k finally converges to 0 and the more agents are deployed, the faster it converges 
as shown in Figure 4a. 



Figure 3. Snapshots of UAVs in Scenario I. (a) k 
(d) k = 50 s; (e) k = 70 s; (f) k = 90 s. 



0 s; (b) k = 10 s; (c) k = 30 s; 




(a) 



(b) 



(c) 




(d) 



(e) 



(f) 



The convergence process of an individual probability map in Scenario II with mobile targets is shown 
in Figure 5, where the probabilities for the cells around targets may not converge to 0 as in Scenario I 
due to the random mobility of targets. However, we still can infer that there are four targets on the roads 
and have a rough estimation of their positions based on the envelopes of the final probability maps of 
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UAVs. The snapshots of UAVs in Scenario II at according times are shown in Figure 6. In this case, 4> k 
does not converge to 0 as shown in Figure 4b. However, a smaller <f> k can be obtained with more agents 
deployed since the collective sensing area becomes larger. Compared with the results in Scenario I, the 
number of deployed agents has a greater impact on the convergence performance of probability maps in 
Scenario II with random target mobility. Hence, the algorithm is more robust with more UAVs deployed. 



Figure 4. Weight average 4> k by different number of agents, (a) Scenario I; (b) Scenario II. 




Figure 5. The convergence of the probability map of an agent in Scenario II. (a) k = 0 s; 

(b) k = 10 s; (c) k = 30 s; (d) k = 50 s; (e) k = 70 s; (f) k = 90 s. 




(d) (e) (f) 
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Figure 6. Snapshots of UAVs in Scenario II. (a) k = 0 s; (b) k = 10 s; (c) /c = 30 s; 
(d) k = 50 s; (e) fc = 70 s; (f) k = 90 s. 





(d) 



(e) 




(f) 



Figure 7. Weight average <p k by different information decaying factor, (a) Scenario I; 
(b) Scenario II. 
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In addition, we also test the impact of the information decaying factor on the convergence results. 
According to the simulation results (as shown in Figure 7), a larger decaying factor will lead to larger 
average uncertainty about the target existence in the whole region, because the accumulated information 
for each cell decays faster. In fact, the design purpose of the decaying factor is to let agents revisit each 
cell at certain frequency to update the latest information about target existence in the cell. Therefore, the 
tradeoff lies in that, a larger decaying factor leads to larger uncertainty, but makes the agents pay more 



Sensors 2014, 14 



9424 



attention to the cells with fewer observations. However, there is no quantitative means of choosing the 
decaying factor and users may find a proper one via simulation method. 

6. Conclusions 

In this paper, we studied the three-dimensional vision-based cooperative control and information 
fusion in target search by a group of UAVs with limited sensing and communication capabilities. 
First, heuristic detection probability and false alarm probability models were built which are related 
to the target discriminability of a camera and varies as a function of altitude. Then, we formulated 
the target search problem as a coverage optimization problem by balancing the coverage area and the 
detection performance. A generalized probability map updating model was proposed, by considering 
the information decay and transmission due to environmental changes such as the target movement. The 
simulation results showed that the proposed algorithms can make the individual probability maps of all 
agents converge to the same one which reflects the true environment when the targets are stationary. The 
influence of target mobility and the number of deployed UAVs on the convergence of probability maps 
has also been illustrated by simulation. Following this work, there is still a big potential area for future 
development and generalization of the proposed method. For example, the extension for detection by 
heterogenous sensors is an interesting topic since more types of information can be combined to improve 
the detection performance. More realistic environmental and system conditions that can affect the search 
results need to be considered such as the light intensity, block on the line of sight, camera with adjustable 
focus, asynchronous communication, etc. 
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Appendix A. Proof of Theorem 1 

First, we consider Case 1, where a target is present within cell g. Define the following augmented 
notations for the entire system: 



Yg,k — [Ql,g,k> Q2,g,hi • • • j QN,g,k\ 
&g,k — [ v l,g,k, v 2,g,k, • • • j ^7V,g,fc] T 



Then, the updating rule (5) and (6) can be replaced by the following equation: 



k 



k 



k 




(25) 



i=i t=i 
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Hence, 



E 



N 



Qi,g,k 



1=1 



N k N 

E E + EE E E 

4=1 i=i jeA/"i,i 



i=l 



where 



E [ V j,9,t] 



Pi,t J- 



and 1 



is the indicator function defined as: 



1 tfgeC jjt 
0 otherwise 



1 {9GC J , t } : 

Since Qi, g fl, Vj ltg>Vl and Vj 2ig>V2 are independent for ji ^ j 2 or 771 7^ 772, we can get the variance 



D 



N 



i=l 



JV fc JV 

E D + EE E D 

4=1 t=l J6JV M 



i=l 



where 



D = />,, (1 - P i>t ) (in ^ - In I-** ] 1 



{ff6Cj, t } 



Considering that D is a continuous function of Pj >t and 1 > p ^ ^ 0.5 + e > 0.5, 

0 < q ^ ^ 0.5 — e < 0.5, there exists a constant real number cr = 2^Jp (0.5 — e) In such that 
D [vj j9} t] < a 2 for g G Cj )t , which implies 



fc JV 



nm yy y 



D[v 



j,g,t\ 



t=l i=l j£Afi,t 9,t 



lim 7 



'(J 



1=1 



< OO 



According to the Kolmogorov Strong Law of Large Numbers, [24] we get 



Et=i Ei=i EjeM t E [ v j,g,t] a .: 



m 9,t 



0, as m g j — > +00 



Hence, 



Ei=i Qi,g,k _ Sill Qi,g,0 , Et=i Ei=i EjeA/^ E fog^] 



+ 



Eti Eii EieM, t ( v i,9,t ~ E [« irf ,, t ]) 



a .s. E*=i Ei=i EjeM.t E [ u i 



as m Si t — > +00 



On the other hand, since pj t > qj t , it is straightforward to get 



8E\v 



J,9,t\ 



dp 
dE[v 
dq 



'j,t 



In** 


-in 1 - 


- Pj,t 


Qj,t 


1 - 


- Qj,t 


Pit _ 


1 - Pj,t 


> 0 


Qj,t 


1 - Qj,t 





< 0 
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which implies 

, 0.5 -e 
EK Sj ,]<2£ln^ Ti <0 

Hence, 

l imsup S^M^ hm sup ^ E ^' 3 ' <] = p In % + (1 - p) In j— ^ < 0 

m Sik ->+oo m 9,A; m g>fe ->+oo m s,t P 1 — p 

which implies YuiLi Qi, g ,k — oo. Since the network is connected all the time, Qi t9> k for each agent 
i will almost surely converge to — oo by implementing the average consensus protocol (6) (as shown 
in [11,25]). 

Following the same procedure of the proof above, we can prove the conclusion of Case 2. 

Appendix B. Proof of Theorem 2 

We first consider h iik 6 {h,h). From Equation (17), we get 

N » N . 

i= i J®k,k 1 l ' ; i= i Mifi^t 

Defining a set of agents for agent i, i.e.,J\fi jk = {j : 9 (ML,-^) Q,* 7^ 0}, it follows that 

'" - 1 <fik (r) I - )dr + (f) k (r) (p i;k - q i)k ) n S 2 (r) dr 



<9r 1 



+ 5^ / 0fc (r) (pi, fc - ft,*) 
+ <f>k (r) (pi ik - 



<9r T 



n§3 (r) dr 



n§4 (r) dr 



where 8} 4 M^flQ,,, Sf 4 M ^ f| 0 (Q.fe). = (9 (M iiifc ) \<9 {0))[)C i>k , S?j 4 

(9 (Mj- fc ) D 5 (O)) D Ci >fc . For r e f| 0'i, J2 G and ^ ^ j 2 ), we have i 
For r e S, 2 ,- 0 G M fc X we have 7^- T = 0. Hence, 



■ n§3 h \r) = -ng| (rj 

9r T 



0fc ( r ) ( ~ ) dr + / 0 fc (r) (p ijfc - gi ifc ) ra §2 (r) dr 



From Equation (9) and according to the results of [15], we get 

%,Jt n dp i)k dq i)k , , , 

o _ U ' ol _ h \ n i,k) , ^7 — h { n i,k 

oc iM dh ijk oh i:k 

and 

dr T dr T 

% 2 0) = % 2 ( r ) 



(9r T 

— — n §2 (r) = tan <p 
dh ijk 
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where (p is half of the angle width of the field of view of each agent (as shown in Figure lb). Therefore, 
Equation (18) holds. 

According to Equation (9), p i: k = p for h i k E (0, h). It is straightforward to get Equation (19) 
following the same procedure of proof as above. 
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